R for Spatial Statistics

 

Maps of Uncertainty

The techniques in this section allow us to create maps showing where our uncertainty may be higher or lower. The most straightforward is a map of the standard deviation of our output results after a series of Monte Carlo (MC) runs. The approach is to:

  1. Create a loop that creates a model and a model prediction with a random component. The random component could be cross-validation, noise injection, sensitivity testing, or some combination of all of these.
  2. On each run, save the predicted values as a new column in a matrix.
  3. After the runs are completed, compute the standard deviation of the predicted values for each row.
  4. Use the standard deviation of the predicted values to create the map as you would for any other prediction values.
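The four steps above can be sketched as a toy Monte Carlo loop. The data, the lm() model, the noise level, and the run count below are all placeholders for illustration, not from the text:

```r
# toy Monte Carlo loop: refit a model with injected noise and collect predictions
library(matrixStats) # for rowSds()

set.seed(42)
TheData = data.frame(X = 1:50)
TheData$Y = 2 * TheData$X + rnorm(50)

NumRuns = 100
PredictionMatrix = matrix(, nrow = nrow(TheData), ncol = 0) # empty matrix, one row per data point

Run = 1
while (Run <= NumRuns)
{
  # step 1: model with a random component (here, noise injection)
  TheData$NoisyY = TheData$Y + rnorm(nrow(TheData), sd = 0.5)
  TheModel = lm(NoisyY ~ X, data = TheData)
  Predicted = predict(TheModel, TheData)

  # step 2: save this run's predictions as a column
  PredictionMatrix = cbind(PredictionMatrix, Predicted)

  Run = Run + 1
}

# step 3: per-row standard deviation of the predictions
PredictionSDs = rowSds(PredictionMatrix)
# step 4: PredictionSDs can now be mapped like any other prediction values
```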

Note: be clear about whether you are mapping the variation in the model outputs (the predictions) or in the residuals. The basic method below maps variation in the predictions; the cross-validation approach maps variation in the residuals.

Basic Method

The solution below will work for creating uncertainty maps in most situations, except when we use cross-validation and are predicting with the original data. The next section shows an approach for that cross-validation case.

This approach uses a matrix to capture all the predicted values. Each time a model run is completed, the predicted values are added as a column to the matrix. After all runs are finished, the standard deviation of each row can be computed to create a vector that can be added back into a dataframe and then used to create an uncertainty map. This provides a map of the standard deviation of the predictions.

The first step is to load the 'matrixStats' library, which includes a number of functions for matrices, including one to compute the standard deviation of each row in a matrix.

# matrixStats includes rowSds() to compute the std dev of each row
library(matrixStats) 
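For example, rowSds() returns one standard deviation per row of a numeric matrix:

```r
library(matrixStats)

# a small 3x2 matrix to demonstrate rowSds()
M = matrix(c(1, 2,
             3, 5,
             10, 10), nrow = 3, byrow = TRUE)

rowSds(M) # one standard deviation per row; the last is 0 because both values are identical
```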

Next, create an empty matrix that has the same number of rows as the dataframe.

# create an empty matrix 
PredictionMatrix=matrix(, nrow = NumRowsInDataFrame, ncol = 0) 

Then, inside your loop, compute the predicted values for all rows in your dataframe and add them as a column to your prediction matrix.

# add the predicted values for this run as a column to the matrix 
PredictionMatrix=cbind(PredictionMatrix,PredictedValues) 

Finally, after the loop, compute the standard deviations for the rows in the matrix. This will result in a vector that can be added to the dataframe and then used to create a map. Note that matrixStats also includes functions for computing means and other matrix statistics that can be used to evaluate models.

# get the standard deviation for each row in the predicted values matrix 
PredictionStdDevs=rowSds(PredictionMatrix) 

# get the mean for each row in the predicted values matrix
PredictionMeans=rowMeans(PredictionMatrix) 
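The resulting vectors line up with the rows of the original dataframe, so they can be added back as columns for mapping. The toy matrix and column names below are illustrative:

```r
library(matrixStats)

# toy stand-in for the prediction matrix: 5 data points (rows) by 3 runs (columns)
set.seed(1)
PredictionMatrix = matrix(rnorm(15), nrow = 5, ncol = 3)
TheDataFrame = data.frame(ID = 1:5)

TheDataFrame$PredMean = rowMeans(PredictionMatrix) # mean prediction per point
TheDataFrame$PredSD = rowSds(PredictionMatrix)     # uncertainty per point

# TheDataFrame can now be used to render the uncertainty map
# like any other prediction values
```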

Creating Uncertainty Maps When Performing Cross-Validation and Predicting with the Original Data Set

There are times when we will have a continuous response variable and we are predicting values on the same data set that we used to create the model. These cases are rare, but would include when we have a biomass or land cover data set, create a model of biomass from it, and then run the model under future scenarios.

The approach below uses k-fold cross-validation to compute the residuals and then create an uncertainty map.

#######################################################################
# code to perform k-fold cross-validation to create an uncertainty map

RandomCount=1 # column index for this set of runs; increment it if wrapping this block in an outer loop

TheData=data.frame(X1s,X2s)
TheData=data.frame(TheData,Measures)

TheData$ID=1:nrow(TheData) # add a column with ids for the original rows
TheData2 <- TheData[sample(nrow(TheData)),] # shuffle the data
TheFolds <- cut(seq(1,nrow(TheData2)),breaks=10,labels=FALSE) # divide the data into 10 buckets

ResidualsMatrix=matrix(, nrow = nrow(TheData2), ncol = 1) # create an empty matrix for the residuals (one column per run)

FoldCount=1
while (FoldCount<=10)
{
  # split the data into test and training
  testIndexes <- which(TheFolds==FoldCount) # get the indexes to the next fold in the data
  testData <- TheData2[testIndexes, ] # use the test indexes to sub-sample the data into a test dataset
  trainData <- TheData2[-testIndexes, ] # use the training indexes to sub-sample the data into a training dataset
  
  # Create a GAM based on the training data (requires the 'mgcv' library)
  TheModel=gam(Measures~s(X1s),data=trainData,gamma=1.4) 
  
  # create the prediction using the test data
  TestPrediction=predict(TheModel,testData,type='response')
  
  # compute the residuals
  Residuals=TestPrediction-testData$Measures
  
  # get the original indexes for the test data to update the residuals
  OriginalIndexes=TheData2$ID[testIndexes]
  
  # go through the residuals adding them into the residuals matrix 
  i=1
  NumRows=length(Residuals)
  while (i<=NumRows)
  {
    OriginalIndex=OriginalIndexes[i] # get the original index for this data point
    
    Value=Residuals[i] # get the residual value
    
    ResidualsMatrix[OriginalIndex,RandomCount]=Value # set the value into the residuals matrix
    
    i=i+1
  }
  FoldCount=FoldCount+1
}

ResidualsStdDevs=rowSds(ResidualsMatrix) # get the standard deviation for each row in the residuals matrix
TheData2$ResSD=ResidualsStdDevs
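Note that with a single pass (RandomCount = 1) the residuals matrix has only one column, and the standard deviation of a single value is NA in R, so rowSds() would return NA for every row. Below is a sketch of wrapping the k-fold procedure in an outer loop over random shuffles so each row receives several residuals; the data and the lm() model here are toy stand-ins for the GAM above:

```r
library(matrixStats)

# toy data: one predictor and a continuous response
set.seed(1)
TheData = data.frame(X1s = runif(100))
TheData$Measures = 3 * TheData$X1s + rnorm(100, sd = 0.2)
TheData$ID = 1:nrow(TheData)

NumRuns = 5
ResidualsMatrix = matrix(, nrow = nrow(TheData), ncol = NumRuns) # one column per run

for (RandomCount in 1:NumRuns)
{
  TheData2 = TheData[sample(nrow(TheData)), ] # reshuffle the data for each run
  TheFolds = cut(seq(1, nrow(TheData2)), breaks = 10, labels = FALSE)

  for (FoldCount in 1:10)
  {
    testIndexes = which(TheFolds == FoldCount)
    testData = TheData2[testIndexes, ]
    trainData = TheData2[-testIndexes, ]

    TheModel = lm(Measures ~ X1s, data = trainData) # toy model in place of the GAM
    TestPrediction = predict(TheModel, testData)

    Residuals = TestPrediction - testData$Measures
    OriginalIndexes = TheData2$ID[testIndexes] # map fold rows back to original rows
    ResidualsMatrix[OriginalIndexes, RandomCount] = Residuals
  }
}

ResidualsStdDevs = rowSds(ResidualsMatrix) # now defined for every row
```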